Data-intensive File Systems for Internet Services: A Rose by Any Other Name... (CMU-PDL-08-114)

Authors

Wittawat Tantisiriroj

Swapnil Patil

Garth Gibson

Abstract

Data-intensive distributed file systems are emerging as a key component of large-scale Internet services and cloud computing platforms. They are designed from the ground up and are tuned for specific application workloads. Leading examples, such as the Google File System, the Hadoop Distributed File System (HDFS) and Amazon S3, are defining this new purpose-built paradigm. It is tempting to classify file systems for large clusters into two disjoint categories: those for Internet services and those for high performance computing. In this paper we compare and contrast parallel file systems, developed for high performance computing, and data-intensive distributed file systems, developed for Internet services. Using PVFS as a representative of parallel file systems and HDFS as a representative of Internet services file systems, we configure a parallel file system into a data-intensive Internet services stack, Hadoop, and test performance with microbenchmarks and macrobenchmarks running on a 4,000-core Internet services cluster, Yahoo!’s M45. Once a number of configuration issues, such as stripe unit sizes and application buffering sizes, are dealt with, issues of replication, data layout and data-guided function shipping are found to be different, but supportable in parallel file systems. The performance of Hadoop applications storing data in an appropriately configured PVFS is comparable to that of applications using a purpose-built HDFS.

Similar resources

DiskReduce: RAID for Data-Intensive Scalable Computing (CMU-PDL-09-112)

Data-intensive file systems, developed for Internet services and popular in cloud computing, provide high reliability and availability by replicating data, typically three copies of everything. Alternatively high performance computing, which has comparable scale, and smaller scale enterprise storage systems get similar tolerance for multiple failures from lower overhead erasure encoding, or RAI...

Data-intensive file systems for Internet services: A rose by any other

Data-intensive file systems for Internet services: A rose

An Informal Publication from Academia's Premiere

Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-08-110, August 2008. Traditionally file system designs have envisioned directories as a means of organizing files for human viewing; that is, directories typically contain a few tens to thousands of files. Users of large, fast file systems have begun to put millions of files into single directories, for example, as simple dat...

File System Virtual Appliances: Third-party File System Implementations without the Pain (CMU-PDL-08-106)

File system virtual appliances (FSVAs) address a major headache faced by third-party FS developers: OS version compatibility. By packaging their FS implementation in a VM, separate from the VM that runs user applications, they can avoid the need to provide an FS port for every kernel version and OS distribution. A small FS-agnostic proxy, maintained by the core OS developers, connects the FSVA ...

Publication date: 2015
